{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# 【图像分类】使用飞桨全流程开发工具PaddleX实现垃圾分类任务"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 一、前言\n",
    "\n",
    "* 随着人们保护环境的意识逐渐增强，垃圾分类成为每一个人的共识。但是繁杂的垃圾种类太难记了，于是自己想使用PP开发一个模型能替我完成垃圾分类的工具。但是AI Studio实现垃圾分类的项目比比皆是，于是我想到使用PaddleX来完成这个任务。\n",
    "* 本项目粗暴地将垃圾按人们最熟悉的标准分为可回收垃圾、餐余垃圾、有害垃圾和其他垃圾四种，使用集飞桨核心框架、模型库、工具及组件等深度学习开发所需全部能力于一身的PaddleX实现垃圾的图像分类任务。\n",
    "* 本项目的数据集和模型可以根据下面的提示操作换成大家自己的，然后快速实现自己图像分类项目的开发。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 二、准备工作\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 下面的命令用于重新运行项目时删除前一次的残留文件\r\n",
    "#!rm -rf output\r\n",
    "#!rm -rf work/*\r\n",
    "#!rm -rf inference_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "### 安装依赖"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 从百度镜像安装PaddleX\r\n",
    "!pip install paddlex -i https://mirror.baidu.com/pypi/simple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "### 关于数据\n",
    "\n",
    "> 由于自己太懒了，所以本项目的数据集将数据集[垃圾分类图片](https://aistudio.baidu.com/aistudio/datasetdetail/35095)魔改成四类，生成四个文件夹放在数据集根目录下。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "> 大家可以跟着下面的操作指引，换成自己的数据集以及对自己的数据集进行数据增强"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 解压数据集，放在work下使其可以永久保存（如果换数据集源记得更改下面的路径）\r\n",
    "!unzip -oq /home/aistudio/data/data104285/Waste.zip -d work/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "work/Waste\r\n",
      "├── FoodWaste\r\n",
      "├── HarmfulWaste\r\n",
      "├── OtherWaste\r\n",
      "└── RecyclableWaste\r\n",
      "\r\n",
      "4 directories, 0 files\r\n"
     ]
    }
   ],
   "source": [
    "# 查看数据集文件结构\r\n",
    "!tree work/Waste -L 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses\n",
      "  import imp\n",
      "Dataset Split Done.\u001b[0m\n",
      "\u001b[0mTrain samples: 10084\u001b[0m\n",
      "\u001b[0mEval samples: 2879\u001b[0m\n",
      "\u001b[0mTest samples: 1439\u001b[0m\n",
      "\u001b[0mSplit files saved in work/Waste\u001b[0m\n",
      "\u001b[0m\u001b[0m"
     ]
    }
   ],
   "source": [
    "# 使用PaddleX划分数据集\r\n",
    "# API说明 https://paddlex.readthedocs.io/zh_CN/release-1.3/data/annotation/classification.html\r\n",
    "# 下列参数dataset_dir后面可改成自己的数据集路径（绝对路径）\r\n",
    "!paddlex --split_dataset --format ImageNet --dataset_dir work/Waste --val_value 0.2 --test_value 0.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-08-15 15:02:59 [INFO]\tStarting to read file list from dataset...\n",
      "2021-08-15 15:03:00 [INFO]\t10084 samples in file work/Waste/train_list.txt\n",
      "2021-08-15 15:03:00 [INFO]\tStarting to read file list from dataset...\n",
      "2021-08-15 15:03:00 [INFO]\t2879 samples in file work/Waste/val_list.txt\n"
     ]
    }
   ],
   "source": [
    "import paddlex as pdx\r\n",
    "from paddlex.cls import transforms\r\n",
    "# 数据增强：定义训练和验证时的transforms\r\n",
    "# API说明 https://paddlex.readthedocs.io/zh_CN/release-1.3/apis/transforms/cls_transforms.html\r\n",
    "train_transforms = transforms.Compose([\r\n",
    "    # 图像预处理代码：随机水平翻转、随机垂直翻转、随机旋转、标准化\r\n",
    "    transforms.RandomHorizontalFlip(),\r\n",
    "    transforms.RandomVerticalFlip(),\r\n",
    "    transforms.RandomRotate(),\r\n",
    "    transforms.Normalize()\r\n",
    "])\r\n",
    "\r\n",
    "eval_transforms = transforms.Compose([\r\n",
    "    transforms.RandomHorizontalFlip(),\r\n",
    "    transforms.RandomVerticalFlip(),\r\n",
    "    transforms.RandomRotate(),\r\n",
    "    transforms.Normalize()\r\n",
    "])\r\n",
    "\r\n",
    "# 数据集读取\r\n",
    "# API说明 https://paddlex.readthedocs.io/zh_CN/release-1.3/apis/datasets.html\r\n",
    "# path为数据集所在目录路径，可改成自己的数据集路径\r\n",
    "path = 'work/Waste'\r\n",
    "train_dataset = pdx.datasets.ImageNet(\r\n",
    "                    data_dir= path,\r\n",
    "                    file_list= path + '/train_list.txt',\r\n",
    "                    label_list= path + '/labels.txt',\r\n",
    "                    transforms=train_transforms)\r\n",
    "eval_dataset = pdx.datasets.ImageNet(\r\n",
    "                    data_dir= path,\r\n",
    "                    file_list= path + '/val_list.txt',\r\n",
    "                    label_list= path + '/labels.txt',\r\n",
    "                    transforms=eval_transforms)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 三、模型选择\n",
    "\n",
    "这里选择移动端模型MobileNetV3作为开发模型\n",
    "\n",
    "> MobileNetV3是在MobileNetV2的基础上提出的网络，其计算量小、参数少，相比其他轻量级网络，依然取得了较好的成绩；同时以ResNeXt101_32x16d_wsl为teacher模型，运用SSLD（简单的半监督标签知识蒸馏）方式蒸馏出MobileNetV3_large模型，作为预训练模型；相对比原有的MobileNetV3预训练模型，在参数量不变的情况下，MobileNetV3_ssld预训练模型在ImageNet数据集上的精度提升3%，有助于用户进一步提升在自定义数据集上模型训练的效果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 设置使用0号GPU卡（如无GPU，执行此代码后仍然会使用CPU训练模型）\r\n",
    "import matplotlib\r\n",
    "matplotlib.use('Agg') \r\n",
    "import os\r\n",
    "os.environ['CUDA_VISIBLE_DEVICES'] = '0'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "### 模型训练\n",
    "> 注：`model.train()`每次运行只能使用一次，需要多次使用时得重启环境；设置`train_batch_size`默认是32，可以根据模型大小设置，但是设太大容易爆内存"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# API说明 https://paddlex.readthedocs.io/zh_CN/release-1.3/apis/models/classification.html\r\n",
    "num_classes = len(train_dataset.labels)\r\n",
    "# model可根据上面链接换成别的模型（注意修改模型参数和model.train的模型输出路径）\r\n",
    "model = pdx.cls.MobileNetV3_large_ssld(num_classes=num_classes)\r\n",
    "# model.train参数可参考上面的链接\r\n",
    "model.train(num_epochs=1,\r\n",
    "            train_dataset=train_dataset,\r\n",
    "            train_batch_size=16,\r\n",
    "            eval_dataset=eval_dataset,\r\n",
    "            lr_decay_epochs=[6, 8],\r\n",
    "            save_interval_epochs=1,\r\n",
    "            learning_rate=0.00625,\r\n",
    "            save_dir='output/mobilenetv3_large_ssld',\r\n",
    "            use_vdl=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 四、效果展示\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-08-15 14:03:25 [INFO]\tModel[MobileNetV3_large_ssld] loaded.\n",
      "Predict Result: [{'category_id': 3, 'category': 'RecyclableWaste', 'score': 0.9998091}]\n"
     ]
    }
   ],
   "source": [
    "import paddlex as pdx\r\n",
    "# 模型载入（记得根据模型修改路径）\r\n",
    "model = pdx.load_model('output/mobilenetv3_large_ssld/best_model')\r\n",
    "# 使用数据集文件夹下test.txt中的一张图片进行预测，打印预测结果（需要根据数据集修改路径）\r\n",
    "image_name = 'work/Waste/RecyclableWaste/img_11129.jpg'\r\n",
    "result = model.predict(image_name)\r\n",
    "print(\"Predict Result:\", result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 五、模型导出"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 模型路径需要根据自己模型的输出路径进行更改\r\n",
    "!paddlex --export_inference --model_dir=output/mobilenetv3_large_ssld/best_model --save_dir=inference_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 六、补充说明\n",
    "\n",
    "* 此版本只是用PaddleX跑通了图像分类任务，关于模型优化和后续部署的部分有待更新。\n",
    "* 本项目参考PaddleX官方文档和官方AI Studio上的项目。\n",
    "* 关于使用PaddleX的感受，本人觉得这个工具很适合各位有创意的开发者进行快速开发，大家可以使用该工具完成自己的创意。\n",
    "* 关于本项目使用PaddleX遇到的问题多为内存溢出，大家可以尝试重启环境释放内存以解决问题\n",
    "* 大家如果运行此项目有什么问题欢迎评论交流=v="
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 个人简介"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "> 作者：刘廷楠\n",
    "\n",
    "\n",
    "> 东北大学秦皇岛分校2019级车辆工程本科生\n",
    "\n",
    "\n",
    "> 感兴趣方向：计算机视觉、深度学习\n",
    "\n",
    "\n",
    "> 我在[AI Studio](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/537248)上获得黄金等级，点亮8个徽章，来互关呀~ "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PaddlePaddle 2.1.2 (Python 3.5)",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
